We present a new way to define and compute the maximum significance achievable for signal and
background processes at the LHC, using all available phase space information. As an example, we
show that a light Higgs boson produced in weak-boson fusion with a subsequent decay into muons
can be extracted from the backgrounds. The method, aimed at phenomenological studies, can be
incorporated in parton-level event generators and accommodate parametric descriptions of detector
effects for selected observables.


The Large Hadron Collider (LHC) will have a tremendous capacity to search for new particles, such as the Standard
Model Higgs boson or new particles suggested by various scenarios for physics beyond the Standard Model. For such
searches, it is important to assess the experimental sensitivity, which requires a description of the experimental search
technique used to isolate signal-rich data. Traditionally, this has been accomplished by using ad hoc kinematic cuts. At
the parton level this process of designing cuts by hand to isolate signal-enhanced phase space regions (which emulates
the traditional experimental practice) is not necessary. In this paper we present a new method of computing the
statistical significance of a hypothesized signal via direct integration of the likelihood ratio. This technique does not
require identification of powerful discriminating variables or techniques to estimate probability density functions from
a discrete sample of events. Instead, we compute the likelihood ratio exactly over the full phase space, which implies
that this expected significance is an upper bound. This maximal significance indicates if a more detailed study with
a full detector simulation is warranted and provides a target significance to which any experimental study can be
compared.

To demonstrate the power of this method, we consider the production of the Standard Model Higgs boson at the
LHC via weak-boson fusion with a subsequent decay to muons. Weak-boson fusion production of a Higgs boson with
a subsequent decay to tau leptons, originally proposed in Ref. [H, has been firmly established by Atlas and CMS as the
main discovery channel for a light Higgs boson in the Standard Model as well as in its supersymmetric extension.
While QCD effects can be a danger for most LHC analyses, additional jet radiation turns into a useful tool in the case
of weak-boson fusion signals Q. Observation of the same process with a decay to muons can experimentally confirm
Yukawa couplings and their scaling with the masses for non-third-generation fermions.

The expected significance of a search for $H \to \mu\mu$ was estimated for weak-boson fusion Q and gluon fusion Q
production modes. For a 120 GeV Higgs boson mass, the best kinematic cuts found in Ref. Q result in a $1.8\sigma$
significance. The authors of that analysis note that many observables display additional discriminating power and
suggest that neural networks or other multivariate procedures could enhance the sensitivity. Using our new method
we find that the maximum possible (target) significance for $H \to \mu\mu$ is much higher, i.e. the cut analysis can indeed
be significantly improved.
A. Neyman-Pearson Lemma
Our approach is based on the Neyman-Pearson lemma: the likelihood ratio is the most powerful variable or test
statistic for a hypothesis test between a simple (i.e. having no free parameters) null hypothesis (background only)
and an alternate hypothesis (signal plus background) [7]. Maximum power is formally defined as the minimum
probability for a Type II error (false negative) for a given probability for a Type I error (false positive). If we assume
that the signal-plus-background hypothesis is true, the most powerful method has the lowest probability of mistaking
the signal for a background fluctuation.

The Neyman-Pearson lemma is commonly used to claim optimality, but these claims can be misleading. The
reason is that the probability density function (pdf) of a multi-dimensional observable $x$ for a given hypothesis is
not experimentally known. Instead, experimentalists typically use a discrete sample of events $\{x_i\}$ to
estimate the pdf [8]. In practice, the size of the sample limits the dimensionality of the pdf that can be estimated
to one or two dimensions, or it requires one to neglect correlations among the observables. Either restriction invalidates
strict claims of optimality. In contrast, in phenomenology we can use the parton-level transition amplitude for a
process (at a given order in perturbation theory) to exactly compute the pdf over the full phase space.

Two main ingredients are needed to calculate the distribution of the likelihood ratio for the background-only and
signal-plus-background hypotheses. First, we have to evaluate identical sets of phase space points for signal and 
background processes, which is not part of standard Monte Carlo event generators. Secondly, we need to bootstrap 
the likelihood ratio distribution for one event to the distribution for a fixed luminosity including Poisson fluctuations. 
Both ingredients are discussed in the next Section. We then consider an example: a light Higgs boson produced 
via weak-boson fusion and decaying to muons. To achieve a minimum level of realism, we generalize our method to 
include experimental resolutions and detector effects. 

It should be noted that this work builds on several techniques used in experimental analyses, but it extends that work
and applies it in a phenomenological context. For instance, the literature is replete with measurement techniques that
use, to varying degrees, the matrix element to describe kinematic distributions [Tol . [Til [T^ . A qualitative distinction
of this work is that we are estimating the sensitivity of a search for a hypothesized particle instead of measuring a
theoretical parameter with data (e.g. the mass of the top quark or the helicity of the W boson). In particular, we are
not trying to identify the maximum-likelihood estimator for a parameter to be extracted from data. The process of
evaluating the likelihood of an event in real data is significantly different from constructing hypothetical data sets, and
this leads to significant differences in the implementation of the algorithms (in particular, the two main ingredients
mentioned in the previous paragraph). Our approach to the incorporation of experimental resolutions is very similar
to the recent work at the Tevatron, generically referred to as the "matrix element method" [l^, [3, and we try to
use similar notation and terminology to make the correspondence clear. Furthermore, we build on the statistical
techniques (e.g. Eqs.^SS^) used by the LEP Higgs working group [9], which generally have not been matched with the
matrix element method.

In short, our method is a novel combination of the LEP statistical formalism with parton-level transition amplitudes,
used to define and compute a mathematically well-defined maximum expected significance. Note that we do not
attempt to identify any powerful discriminating observables, nor do we attempt to compute an observed significance
based on experimental data [17]. Instead, we formulate and answer the question: what is the maximum expected
significance of a potential physics signal, e.g. a Higgs decaying to muons?



We first limit ourselves to a signal process and its irreducible backgrounds, i.e. signal and background processes 
with identical degrees of freedom in the final state, distinguished by (kinematic) distributions. To compute the 
expected signal and background rates we integrate the matrix elements squared over the phase space, with or without 
(acceptance) cuts, using a Monte Carlo integration. This method probes the phase space with random numbers. 
Ideally, the dimension of the random number vector r is given by the number of degrees of freedom in the final- 
state momenta after all kinematic constraints. The random number vector forms a (minimal) basis for all final-state 
configurations. We can schematically write

$$ \sigma_{\rm tot} = \int d\vec{r} \; M(\vec{r}) \, d\sigma(\vec{r}) \qquad (1) $$

where the phase space boundaries are included in the integral, and the differential cross section $d\sigma(\vec{r})$ includes all
phase space factors and the Jacobian for transforming the integration to the random-number basis. The integration
over the parton distributions is included in the phase space integral. The measurement function $M$ can be used to
include additional cuts or to incorporate event weights (e.g. particle identification efficiencies) as a function of any
observable. Removing unwanted parts of the phase space through cuts on observable quantities consistently removes
the contribution of these phase space regions from all processes. Because the random numbers parameterize the entire
phase space, all potentially available information about the process is included in the array of event weights $(M \, d\sigma)(\vec{r})$.
Note that the phase space integration above is written assuming a simple cross section expression $d\sigma$; however, it can
be replaced with any combination of differential cross sections which modern parton-level event generators predict.
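
As a toy illustration of integrating event weights in a random-number basis, the sketch below estimates a total rate and a rate after an acceptance cut on the same set of points. The functions `dsigma` and `measurement` are hypothetical placeholders chosen so that the exact answers are known; they are not matrix elements or cuts from this analysis.

```python
import random

def dsigma(r):
    """Hypothetical differential weight in the random-number basis (not a real matrix element)."""
    x, y = r
    return 6.0 * x * y * y  # integrates to 1 over the unit square

def measurement(r):
    """Hypothetical measurement function M: an acceptance cut on the first coordinate."""
    return 1.0 if r[0] > 0.5 else 0.0

def mc_integrate(weight, n_points, dim=2, seed=1):
    """Plain Monte Carlo estimate of the integral of weight(r) over the unit hypercube."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_points):
        r = [rng.random() for _ in range(dim)]
        total += weight(r)
    return total / n_points

# The fixed seed means both integrals are evaluated on identical phase space points.
sigma_tot = mc_integrate(dsigma, 200_000)
sigma_cut = mc_integrate(lambda r: measurement(r) * dsigma(r), 200_000)
```

For this toy weight the exact values are 1 without the cut and 0.75 with it, so the two estimates can be checked directly.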

A cut analysis defines a signal-rich region bounded by upper and lower limits on observables and then counts events
in that region. Ultimately, the variable that discriminates between signal and background (the test statistic) is
simply the number of events observed in this region. Predicting the expected number of background events $b$ and
signal events $s$ enables us to adjust the cut values to optimize the experimental sensitivity. More sophisticated
techniques use multivariate algorithms, such as neural networks, to define more complicated signal-like regions, but
the test statistic often remains unchanged. In all of these counting analyses, the likelihood of observing $n$ events
assuming the background-only hypothesis is simply given by the Poisson distribution $\mathrm{Pois}(n|b) = e^{-b} b^n / n!$.
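
The counting likelihood can be evaluated in a few lines. The following is a minimal sketch; the helper names `poisson_pmf` and `background_p_value` are illustrative and not taken from any analysis code.

```python
import math

def poisson_pmf(n, mean):
    """Pois(n | mean) = exp(-mean) * mean**n / n!, evaluated in log space (requires mean > 0)."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))

def background_p_value(n_obs, b):
    """Probability of observing n_obs or more events under the background-only hypothesis."""
    return 1.0 - sum(poisson_pmf(n, b) for n in range(n_obs))
```

A small `background_p_value` for the observed count signals that the background-only hypothesis describes the data poorly.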



B. Likelihood Ratio and Discovery Potential 

There are extensions to this number counting, assuming we know the distribution of a discriminating observable
$x$ (which may be multi-dimensional). We assume that for the background-only hypothesis $H_0$ this distribution is
$f_b(x)$, while for the signal-plus-background hypothesis $H_1$ it is $f_{s+b}(x) = [s f_s(x) + b f_b(x)]/(s+b)$, assuming no
interference. Following the Neyman-Pearson lemma, the most powerful test statistic is the likelihood ratio for the
entire experiment's data. The total likelihood for the full-experiment observable $\mathbf{x} = \{x_j\}$ can be factorized into the
Poisson likelihood to observe $n$ events and the product of the individual events' likelihoods $f(x_j)$:

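$$ Q = \frac{L(\mathbf{x}|H_1)}{L(\mathbf{x}|H_0)} = \frac{\mathrm{Pois}(n|s+b) \prod_{j=1}^{n} f_{s+b}(x_j)}{\mathrm{Pois}(n|b) \prod_{j=1}^{n} f_b(x_j)} = e^{-s} \left( \frac{s+b}{b} \right)^{n} \frac{\prod_{j=1}^{n} f_{s+b}(x_j)}{\prod_{j=1}^{n} f_b(x_j)} $$

$$ q(\mathbf{x}) = \ln Q(\mathbf{x}) = -s + \sum_{j=1}^{n} \ln\left( 1 + \frac{s f_s(x_j)}{b f_b(x_j)} \right) \qquad (2) $$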
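
The log-likelihood ratio of Eq.(2) is straightforward to evaluate once the single-event densities are available. The sketch below assumes the densities are supplied as Python callables `f_s` and `f_b` (hypothetical names); in the method described here they would come from the normalized parton-level matrix elements.

```python
import math

def log_likelihood_ratio(events, s, b, f_s, f_b):
    """q = -s + sum_j ln(1 + s f_s(x_j) / (b f_b(x_j))) for a list of observed events."""
    return -s + sum(math.log1p(s * f_s(x) / (b * f_b(x))) for x in events)
```

When signal and background have identical shapes, the expression reduces to the logarithm of the ratio of Poisson probabilities, i.e. to pure number counting.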

We compute the normalized probability distributions $f(x)$ from the parton-level matrix elements. This way we
construct a log-likelihood ratio map of all possible final-state phase space configurations using the normalized probability
distributions $d\sigma(\vec{r})/\sigma_{\rm tot}$ for the signal and background hypotheses:

$$ q(\vec{r}) = -\sigma_{{\rm tot},s} \, \mathcal{L} + \ln\left( 1 + \frac{d\sigma_s(\vec{r})}{d\sigma_b(\vec{r})} \right) \qquad (3) $$

Here $\mathcal{L}$ is the integrated luminosity. To construct the single-event probability distribution $p_{1,b}(q)$ we combine the
background event weight with the log-likelihood ratio map $q(\vec{r})$ from Eq.(3), which in general is not invertible:

$$ p_{1,b}(q_0) = \int d\vec{r} \; \frac{d\sigma_b(\vec{r})}{\sigma_{{\rm tot},b}} \; \delta\left( q(\vec{r}) - q_0 \right) \qquad (4) $$
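
In a numerical implementation the delta function in Eq.(4) becomes a weighted histogram of the map in Eq.(3). The following sketch assumes that signal and background weights have been evaluated on the same phase space points, sampled uniformly in the random-number basis so that a total cross section is the plain average of the weights; the function names and binning are illustrative only.

```python
import math

def llr_map(dsig_s, dsig_b, lumi):
    """q(r_i) = -sigma_tot_s * lumi + ln(1 + dsig_s(r_i) / dsig_b(r_i)) on shared points."""
    sigma_tot_s = sum(dsig_s) / len(dsig_s)  # assumption: uniform sampling of the unit hypercube
    return [-sigma_tot_s * lumi + math.log1p(ws / wb) for ws, wb in zip(dsig_s, dsig_b)]

def weighted_histogram(values, weights, lo, hi, n_bins):
    """Normalized histogram of values; each entry is weighted by weight / sum(weights)."""
    hist = [0.0] * n_bins
    norm = sum(weights)
    width = (hi - lo) / n_bins
    for v, w in zip(values, weights):
        k = math.floor((v - lo) / width)
        if 0 <= k < n_bins:
            hist[k] += w / norm
    return hist
```

Filling the histogram with the background weights approximates $p_{1,b}(q)$; filling it with the signal weights gives the corresponding single-event signal distribution.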

For multiple events, the distribution of the log-likelihood ratio $p_{n,b}$ can be computed by repeated convolutions of
the single-event distribution. We can perform this convolution either implicitly with approximate Monte Carlo
techniques [11], or analytically using a Fourier transform [l^ .

The expected log-likelihood ratio distribution for a background including Poisson fluctuations in the number of
events takes the form $p_b(q) = \sum_n \mathrm{Pois}(n|b) \, p_{n,b}(q)$. To compute this $p_b(q)$ from the single-event likelihood $p_{1,b}(q)$
given by Eq.(4) we first Fourier transform all $p$ functions into complex-valued functions of the Fourier conjugate of
the likelihood ratio, e.g. $\overline{p_{1,b}}$. The Fourier-transformed $n$-event likelihood ratio is now given by $\overline{p_{n,b}} = (\overline{p_{1,b}})^n$, equivalent
to a convolution in $q$-space. The sum over $n$ in the formula for $p_b(q)$ now has a simple form in the Fourier domain:
$\overline{p_b} = \exp[b \, (\overline{p_{1,b}} - 1)]$. For the signal-plus-background hypothesis we expect $s$ events from the $p_{1,s}$ distribution and
$b$ events from the $p_{1,b}$ distribution. Similar to the above formula we have $\overline{p_{s+b}} = \exp[b \, (\overline{p_{1,b}} - 1) + s \, (\overline{p_{1,s}} - 1)]$. This
form we can transform back to obtain the log-likelihood ratio distributions $p_b(q)$ and $p_{s+b}(q)$.
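
The Fourier-domain formulas can be sketched with a discrete FFT. The code below is a toy under stated assumptions: the single-event distributions are binned on a uniform grid $q_k = k \, \Delta q$ starting at zero, any constant offset of $q$ is handled separately, and the grid is long enough that the circular FFT convolution does not wrap around. The function name `llr_distribution` is illustrative.

```python
import math
import numpy as np

def llr_distribution(p1_b, b, p1_s=None, s=0.0):
    """Back-transform exp[b (FT(p1_b) - 1) + s (FT(p1_s) - 1)] on a uniform grid.

    p1_b and p1_s are binned single-event distributions that each sum to one.
    """
    exponent = b * (np.fft.fft(p1_b) - 1.0)
    if p1_s is not None:
        exponent = exponent + s * (np.fft.fft(p1_s) - 1.0)
    return np.real(np.fft.ifft(np.exp(exponent)))

# Toy check: if every event contributes exactly one grid step, the result is Pois(n | b).
n_bins, b = 64, 3.0
p1_delta = np.zeros(n_bins)
p1_delta[1] = 1.0
p_b = llr_distribution(p1_delta, b)
poisson = np.array([math.exp(-b) * b**n / math.factorial(n) for n in range(n_bins)])
```

The toy check makes the structure explicit: the exponential in the Fourier domain is the Poisson-weighted sum over all $n$-fold convolutions.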

Given a log-likelihood ratio $q$ we can calculate the background-only confidence level, $\mathrm{CL}_b$:

$$ \mathrm{CL}_b(q) = \int_q^{\infty} dq' \; p_b(q') \qquad (5) $$

To estimate the discovery potential of a future experiment we assume the signal-plus-background hypothesis to be
true and compute $\mathrm{CL}_b$ for the median of the signal-plus-background distribution, $q^*_{s+b}$. This expected background
confidence level can be converted into an equivalent number of Gaussian standard deviations and the significance
written as $Z\sigma$ by implicitly solving $\mathrm{CL}_b(q^*_{s+b}) = [1 - \mathrm{erf}(Z/\sqrt{2})]/2$ for $Z$.
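
The conversion between a background confidence level and a number of Gaussian standard deviations can be done with the Python standard library. This is a minimal sketch; the function names are illustrative.

```python
import math
from statistics import NormalDist

def clb_from_significance(z):
    """One-sided Gaussian tail: (1 - erf(z / sqrt(2))) / 2, written with erfc for accuracy."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def significance_from_clb(cl_b):
    """Invert the relation above for Z; requires 0 < cl_b < 1."""
    return -NormalDist().inv_cdf(cl_b)
```

For example, `clb_from_significance(5.0)` is the tail probability conventionally associated with a five-sigma discovery, and `significance_from_clb` maps it back to Z = 5.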



C. Higgs Decay to Muons 

To determine the maximal significance in a strict sense we should not include detector effects, which always decrease
the significance. However, in our example of weak-boson-fusion $H \to \mu\mu$ the experimental resolution on the invariant
mass $m_{\mu\mu}$ is much larger than the Higgs width: about 1.6 GeV for CMS and 2.0 GeV for Atlas [20]. To obtain
a semi-realistic result we introduce a Gaussian smearing for $m_{\mu\mu}$ into Eq.(1). This Gaussian shape is just a simple
numerical choice and could be replaced with any other smearing prescription or fast detector simulation. We convolve
our momentum smearing with the Breit-Wigner-shaped Higgs propagator; in our case, the combination is completely
dominated by the much larger Gaussian width.

Figure 1: Normalized $p_b(q)$ and $p_{s+b}(q)$ distributions, corresponding to the full-experiment log-likelihood ratio in Eq.(2). These
distributions define the expected significance.

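We introduce a new random number $r_m^*$ corresponding to the smeared $m^*_{\mu\mu}$ and integrate over a transfer function
from the true $m_{\mu\mu}$ to the smeared $m^*_{\mu\mu}$ by aligning one of the original random numbers with $m_{\mu\mu}$:

$$ \sigma_{\rm tot} = \int d\vec{r}_\perp \, dr_m^* \; M(\vec{r}_\perp, r_m^*) \int dr_m \; d\sigma(\vec{r}_\perp, r_m) \; W(r_m, r_m^*) \qquad (6) $$

The original random number vector $\vec{r}$ is split into $\vec{r} \to \{\vec{r}_\perp, r_m\}$. In our case, the transfer function $W$ is a normalized
Gaussian giving the likelihood to reconstruct $m^*_{\mu\mu}$ given the true $m_{\mu\mu}$ and the experimental mass resolution. We
trivially get back Eq.(1) for $W(r_m, r_m^*) \to \delta(r_m - r_m^*)$.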
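
A Gaussian transfer function and the inner integral over the true mass can be sketched as follows. This is a one-dimensional toy: `true_density` stands in for the differential weight as a function of the true mass, the integration range and step count are arbitrary choices, and neither function is taken from the analysis code.

```python
import math

def gaussian_transfer(m_true, m_reco, resolution):
    """Normalized Gaussian W: probability density to reconstruct m_reco given m_true."""
    z = (m_reco - m_true) / resolution
    return math.exp(-0.5 * z * z) / (resolution * math.sqrt(2.0 * math.pi))

def smeared_weight(m_reco, true_density, resolution, m_lo, m_hi, n_steps=2000):
    """Midpoint-rule estimate of the integral over m_true of true_density * W(m_true, m_reco)."""
    step = (m_hi - m_lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        m_true = m_lo + (i + 0.5) * step
        total += true_density(m_true) * gaussian_transfer(m_true, m_reco, resolution)
    return total * step
```

With a resolution of 1.6 GeV (the CMS value quoted above) a flat true density stays flat after smearing, while a narrow peak is broadened to roughly the Gaussian width.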

In general one must be careful about the mapping between parton-level quantities and their observable counterparts.
As in most experimental analyses that use the matrix element method, the jet direction is assumed to be
well-measured. We do not include a jet energy scale in the transfer function W, because, unlike the top-mass 
measurement with a hadronically decaying resonance, the jet energy scale is not a dominant experimental issue for 
this search. In particular, the jet momenta have relatively flat distributions, i.e. their variation on the scale of detector 
effects is small [11 . In the general case, one should consider all permutations between out-going quarks and gluons 
with jets; however, in the case of weak boson fusion, the signal-like regions of phase space have more than three units 
of pseudorapidity separating the tagging jets, which makes the association of parton to jet unambiguous. In other 
words, adding the alternative jet-parton assignment would give a negligible contribution to the event weight. The 
correspondence of the muons is also clear due to their charge. 

From Eq.(6) it is obvious how to include an experimental mass resolution: we replace the event weights $(M \, d\sigma)$ by
the integral $(M \int dr_m \, d\sigma \, W)$ and evaluate them over the smeared phase space $\{\vec{r}_\perp, r_m^*\}$. Because the random numbers
form a (minimal) basis for all final-state configurations there is no 'back door' for the true (infinitely well measured)
$m_{\mu\mu}$ to enter the likelihood calculation. A rough approximation to incorporating the $m_{\mu\mu}$ mass resolution could be
an increased physical Higgs width. It replaces the Gaussian smearing with a Breit-Wigner function; we compared
this approximate method with the proper smearing procedure and found that the difference in the final results was
small but not negligible.

For all details of the signal and background simulation (using CTEQ 5L parton distributions) we refer to Ref. [H]. 
There, after very basic cuts the signal cross section for a 120 GeV Higgs is 0.22 fb, hidden under 0.33 fb of electroweak 
Z production and 2.6 fb of QCD Z production, where the Z decays into muons. All other backgrounds combined 
contribute less than 0.01 fb, which allows us to neglect them. It is worth mentioning that the electroweak Z production 
consists of as many as 48 diagrams for a fixed flavor configuration, which is substantially more complicated than 
the search for single top production [l^. Conservatively assuming no additional information from higher-order jet 
radiation, we could apply K factors to the signal and cross section rates, but this would lead beyond this proof-of- 
principle letter. 

To probe the likelihood ratio over the full phase space, we relax the cuts for a 120 GeV Standard Model Higgs
to mere acceptance cuts. All cross sections are finite, so the cut values have no effect on the likelihood we obtain.
Using 2^° points we integrate over the final-state phase space projected onto the log-likelihood ratio $q(\vec{r})$ according
to Eq.(4). The phase space points used for this integration are defined by the same grid we use for the integration
over the signal and background amplitudes described in Eq.(1); this way we can check the total rates to ensure that
the likelihood integration covers the entire phase space. For each phase space point we integrate over the true $m_{\mu\mu}$
as shown in Eq.(6), using a proper phase space mapping. Note that this internal integration does not have to use the
same grid for signal and background.

Figure 2: Muon invariant mass distribution for the 120 GeV Higgs signal and Z+jets background with acceptance cuts only
(upper curves) and after a cut on the log-likelihood ratio $q(\vec{r}) > -1.5$ (lower curves). The curves correspond to CMS and
illustrate that events with high $q(\vec{r})$ have an increased signal purity and signal-like characteristics.

The resulting log-likelihood distributions $p_b(q)$ and $p_{s+b}(q)$ are shown in Fig. 1. From the background pdf we extract
the signal significance for an integrated luminosity of 300 fb$^{-1}$ as $3.54\sigma$ for CMS and $3.19\sigma$ for Atlas. Note that
this significance estimate neglects theoretical uncertainty in the overall rate since the signal has small higher-order
corrections [21] and the normalization of the background will be well measured with 300 fb$^{-1}$ of data. Also note
that this significance does not include a minijet veto because only two jets are included in our parton-level transition
amplitude; in principle, the same procedure could be repeated with a higher-order tree-level or a next-to-leading
order calculation. Following Ref. [5] we can estimate the effect of a minijet veto, which increases the significance to
$4.4\sigma$ for CMS. Survival probabilities for the veto neglect pile-up effects, which will degrade the enhancement in
significance. Combining both experiments, the significance even without a minijet veto is $4.77\sigma$.
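
As a quick arithmetic cross-check, the quoted combined significance is numerically consistent with adding the two single-experiment values in quadrature. The text does not state how the combination was performed, so the quadrature sum below is an assumption and only approximates a proper combination of the likelihoods.

```python
import math

z_cms, z_atlas = 3.54, 3.19  # single-experiment significances quoted in the text

# Assumption: independent experiments combined in quadrature.
z_combined = math.hypot(z_cms, z_atlas)
```

Rounded to two decimals, `z_combined` matches the 4.77 quoted for the combination without a minijet veto.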

The most relevant kinematic distribution is the reconstructed Higgs mass $m_{\mu\mu}$. In the upper curves of Fig. 2 we
show it for signal and backgrounds without kinematic or likelihood cuts. The signal shows a smeared mass peak, while
the backgrounds are flat. To illustrate how the method isolates signal-rich phase space regions, we apply a likelihood
ratio cut $q(\vec{r}) > -1.5$. Roughly a third of the signal events survive this cut, and each of the backgrounds is reduced
to a rate comparable to the signal. After the likelihood cut the backgrounds show the same kinematic features as the
signal, i.e. a peak in $m_{\mu\mu}$.

D. Detector Effects and Reducible Backgrounds 

The procedure for incorporating detector smearing on observables described above is tailored for smearing of a few 
observables, which are isolated in the phase space integration. Nevertheless, it is possible to generalize the smearing 
procedure. In essence, a complete detector smearing requires an integration over a fixed set of experimental observables 
with a nested integration over the remaining degrees of freedom in the phase space. The latter include the unsmeared 
(true) observables, as shown in Eq.(6), as well as the unobservable longitudinal component of neutrino momenta at a
hadron collider or the momentum of particles not passing the acceptance cuts. As mentioned in the previous section 
and discussed in the literature related to the matrix element method, one should take care to include in the transfer 
function all relevant detector effects and consider all permutations that arise from ambiguities in the mapping from 
parton-level quantities to their final state observables. 

We usually include detector effects by smearing all final-state four-momenta; however, this can be computationally
inefficient. If we instead choose not to smear some of the observables, we must remain vigilant to ensure that there
is no 'back door' through which four-momentum conservation together with unsmeared observables implicitly evade
smearing. We avoid this 'back door' explicitly in Eq.(6) by factorizing the basis of the phase space into orthogonal
components $\vec{r}_\perp$ and $r_m$.

After generalizing our method to smear multiple observables we can now incorporate reducible backgrounds,
i.e. backgrounds whose final-state configurations have more degrees of freedom than the signal. We simply pick a
set of observables that is common to all signal and background processes, and marginalize the additional background
degrees of freedom. Flavor tagging efficiencies and fake rates can be included in the event weights through $W$. In
these scenarios, the interpretation of the resulting significance is less sharp: it is the maximal significance given the
specified set of observables and the assumptions in the transfer and measurement functions.

E. Conclusions 

We have described a way to compute the mathematically strict maximum significance for a set of signal and 
background processes at the parton-level. Our method is based on the Neyman-Pearson lemma and can be used to 
decide if a new physics search at high-energy colliders has a sufficiently large discovery potential to justify a dedicated 
analysis. 

While our example is fairly simple, including only irreducible backgrounds and incorporating experimental resolution 
for only a single observable, we have outlined the extension of the method to include general detector effects. This 
approach to including detector effects follows closely the recent experimental work at the Tevatron referred to as 'the 
matrix element method'. The next step will be to implement this likelihood computation into a parton-level event
generator with a simple and fast simulation of detector effects [l^l.

Weak-boson-fusion production of a Higgs boson with a subsequent decay to muons is the perfect showcase for this
new method: it suffers from a very low signal rate and from the lack of distributions that clearly distinguish signal from
background. A very basic cut analysis in Ref. Q quotes a significance of $1.8\sigma$ for 300 fb$^{-1}$ for a single experiment.
In particular, Ref. [a] found that a cut analysis was likely not the best-suited strategy for this signal. Applying our
method we arrive at a possible maximum significance of $3.54\sigma$ (CMS with 300 fb$^{-1}$). By increasing the complexity of
the final state, higher-order QCD effects can be exploited using a minijet veto, which could increase the significance
to about $4.4\sigma$. Not only is this result grounds for a more careful study by the experimental collaborations, but it also
indicates that without a luminosity upgrade Atlas and CMS combined may be able to observe the decay $H \to \mu\mu$.